Introduction to GFS

Understand the requirements that led to the development of GFS and learn about its architecture.

Problems with traditional file systems#

Before Google File System (GFS), there were single-node file systems, network-attached storage (NAS) systems, and storage area networks (SAN). These file systems served well at the time they were designed. However, with the growing data storage and processing needs of large distributed data-intensive applications, these systems showed limitations, some of which are discussed below.

A single-node file system is a system that runs on a single computer and manages the storage attached to it. A single server has limited resources, such as storage space, processing power, and the number of I/O operations that can be performed on a storage disk per second. We can attach substantial storage capacity to a single server, increase the RAM, and upgrade the processor, but there are limits to this type of vertical scaling. A single server is also limited in the number of data reads and writes it can serve and in how quickly data is stored and accessed. These limitations restrict the system's ability to process large datasets and serve a large number of clients simultaneously. We also can't ignore the fact that a single-node system is a single point of failure: if the server goes down, the system becomes unavailable to its users.

A network-attached storage (NAS) system consists of a file-level storage server with multiple clients connected to it over a network running the network file system (NFS) protocol. Clients can store and access files on this remote storage server as if they were local files. The NAS system has the same limitations as a single-node file system. Setting up and managing a NAS system is easy, but scaling it is expensive. This system can also suffer from throughput issues when clients access large files from a single server.

[Figure: The traditional networked file systems. In NAS, clients access a file storage server over the NFS protocol; in SAN, host servers access arrays of storage devices over a Fibre Channel network.]

A storage area network (SAN) consists of a cluster of commodity storage devices connected to each other, providing block-level data storage to clients over the network. SAN systems are easy to scale by adding more storage devices. However, these systems are difficult to manage because of the complexity of the second network: the Fibre Channel (FC) network. Setting up Fibre Channel requires dedicated host bus adapters (HBAs) deployed on each host server, switches, and specialized cabling. It is difficult to pinpoint where a failure has occurred in such a complex system. Data inconsistency issues among replicas may also appear, and rebalancing the load across the storage devices is difficult to handle with this architecture.

Note: SAN deployments are special-purpose networks separate from the usual Ethernet networks. This duplicate network, while useful for segregating storage traffic, is expensive to build and operate.

Given the issues above with the traditional networked file systems, Google developed a file system called GFS that mitigates the posed limitations while also providing the benefits that these traditional systems possess. It supports the increasing workload of its applications and data processing needs on commodity hardware.

GFS#

Google File System (GFS) is a distributed file system that stores large amounts of data for distributed data-intensive applications on a cluster of commodity servers. The functional and non-functional requirements of GFS are listed below.

Functional requirements#

Functional requirements include the following:

  • Data storage: The system should allow users to store their data on GFS.
  • Data retrieval: The system should be able to give data back to users when they request it.

Non-functional requirements#

Non-functional requirements include the following:

  • Scalability: The system should be able to store an increasing amount of data (hundreds of terabytes and beyond), and handle a large number of clients concurrently.
  • Availability: A file system is one of the main building blocks of many large systems used for data storage and retrieval. The unavailability of such a system disrupts the operation of the systems that rely on it. Therefore, the file system should be able to respond to the client’s requests all the time.
  • Fault tolerance: The system’s availability and data integrity shouldn’t be compromised by component failures that are common in large clusters consisting of commodity hardware.
  • Durability: Once the system has acknowledged to the user that their data has been stored, the data shouldn’t be lost unless the user deletes it themselves.
  • Easy operational management: The system should easily be able to store multi-GB files and beyond. It should be easy for the system to handle data replication, re-replication, garbage collection, taking snapshots, and other system-wide management activities. If some data becomes stale, there should be an easy mechanism to detect and fix it. The system should allow multiple independent tenants to use GFS for safe data storage and retrieval.
  • Performance optimization: The focus should be on high throughput rather than low latency for applications that process large datasets. Additionally, Google’s applications, for which the system was being built, most often append data to files instead of overwriting existing data, so the system should be optimized for append operations. For example, a logging application appends new log entries to log files. A web crawler appends new web crawl data to a crawl file instead of overwriting existing copies of the crawled data. MapReduce jobs write their output files from beginning to end by appending key/value pairs.
  • Relaxed consistency model: GFS does not need to comply with POSIX standards because of the unique characteristics of the use cases/applications that it targets to serve. A file system must implement a strong consistency model in order to be POSIX compatible. In POSIX, random write is one of the fundamental operations. In GFS, there are more append operations and very few random writes. That’s why GFS doesn’t comply with POSIX and provides a relaxed consistency model that is optimized for append operations. Data consistency in a distributed setting is hard, and GFS carefully opts for a custom-consistency model for better scalability and performance.

Let’s explore GFS’s architecture, which enables it to fulfill the requirements mentioned above.

Architecture#

A GFS cluster consists of two major types of components: a manager node and a large number of chunkservers. It stores a file’s data in the form of chunks. The architecture is shown in the following illustration.



[Figure: Architecture of GFS. An application performs file operations through the GFS client. The client sends (filename, chunk index) to the GFS manager, which looks up its file namespace (e.g., /dir1/file1) and chunk mapping (e.g., chunk index 1 maps to chunk handle gf61, replicated on chunkservers X, W, and Z) and replies with the chunk handle and chunk locations; this is the control flow. The client then sends (chunk handle, byte range) to one of the GFS chunkservers (W, X, Y), each of which stores chunks on its local Linux file system, and receives the data directly; this is the data flow. The manager also sends instructions to the chunkservers and collects their state.]
  • The client is a GFS application program interface through which the end users perform the directory or the file operations.

  • Each file is split into fixed-size chunks. The manager assigns each chunk a 64-bit globally unique ID and assigns the chunkservers where the chunk is stored and replicated. The manager acts like an administrator that manages the file system metadata, including namespaces, file-to-chunk mappings, and chunk locations. The metadata is stored in the manager’s memory for good performance. For a persistent record of the metadata, the manager logs metadata changes in an operation log on its hard disk so that it can recover its state after a restart by replaying the operation log. Besides managing metadata, the manager also handles the following tasks:
    • Data replication and rebalancing
    • Operational locks to ensure data consistency
    • Garbage collection of the deleted data

    Note: Though Google calls this manager a “master” in their research paper on GFS, we will use the term “GFS manager”, “manager node”, or simply the “manager” to refer to the same thing.

  • Chunkservers are commodity storage servers that store chunks as plain Linux files.
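To make the manager’s bookkeeping concrete, here is a minimal sketch in Python. The class and method names are invented for illustration (GFS exposes no such API); it models the in-memory file-to-chunk mapping and chunk locations, with a 64-bit handle assigned to each new chunk:

```python
import secrets

class ManagerMetadata:
    """Toy model of the GFS manager's in-memory metadata (names are illustrative)."""

    def __init__(self):
        self.file_to_chunks = {}   # path -> list of chunk handles, ordered by chunk index
        self.chunk_locations = {}  # chunk handle -> chunkserver IDs holding replicas

    def create_chunk(self, path, replica_servers):
        """Assign a new 64-bit handle (a real manager would guarantee global
        uniqueness) and record which chunkservers hold the replicas."""
        handle = secrets.randbits(64)
        self.file_to_chunks.setdefault(path, []).append(handle)
        self.chunk_locations[handle] = list(replica_servers)
        return handle

    def lookup(self, path, chunk_index):
        """Resolve (filename, chunk index) -> (chunk handle, replica locations)."""
        handle = self.file_to_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]

meta = ManagerMetadata()
meta.create_chunk("/dir1/file1", ["W", "X", "Z"])
meta.create_chunk("/dir1/file1", ["Y", "X", "W"])
handle, replicas = meta.lookup("/dir1/file1", 1)  # second chunk of the file
```

Because all of this lives in memory, a lookup is a pair of dictionary accesses, which is why the manager can answer metadata requests quickly.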

The client requests metadata, such as the locations of the requested chunks, from the manager node. The manager looks into its metadata and responds with the locations of the requested chunks. The client then contacts the chunkservers directly for data operations. It is important to note that data doesn't flow through the manager; it flows directly between the client and the chunkservers.

Note: The largest GFS cluster can store up to tens of petabytes of data and can be accessed by hundreds of clients concurrently.
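The read path above can be sketched end to end. The stub classes and names below are invented for illustration; the 64 MB chunk size is the one GFS uses. Note how the manager serves only metadata, while the bytes come straight from a chunkserver:

```python
CHUNK_SIZE = 64 * 2**20  # GFS uses fixed-size 64 MB chunks

class StubManager:
    """Stands in for the GFS manager: resolves (path, chunk index) to a
    chunk handle and its replica locations."""
    def __init__(self, table):
        self.table = table  # (path, chunk_index) -> (handle, [server IDs])
    def lookup(self, path, index):
        return self.table[(path, index)]

class StubChunkserver:
    """Stands in for a chunkserver: serves byte ranges of the chunks it stores."""
    def __init__(self, chunks):
        self.chunks = chunks  # handle -> chunk contents as bytes
    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

def gfs_read(manager, chunkservers, path, offset, length):
    """Client-side read: metadata from the manager, data from a chunkserver."""
    index = offset // CHUNK_SIZE                     # which chunk holds this offset
    handle, locations = manager.lookup(path, index)  # control flow: client <-> manager
    server = chunkservers[locations[0]]              # any replica will do
    return server.read(handle, offset % CHUNK_SIZE, length)  # data flow: client <-> chunkserver

manager = StubManager({("/dir1/file1", 0): (0x4BE7, ["X", "W"])})
servers = {
    "X": StubChunkserver({0x4BE7: b"hello, gfs"}),
    "W": StubChunkserver({0x4BE7: b"hello, gfs"}),
}
data = gfs_read(manager, servers, "/dir1/file1", 7, 3)
```

In a real deployment the client also caches the handle and locations it gets back, so subsequent reads of the same chunk skip the manager entirely.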

Points to ponder#

Question

How can we reduce the amount of time required to replay the growing operation log to rebuild the manager state?

Answer

Checkpoint the manager state when the log grows beyond a certain size. Load the latest checkpoint and replay the logs that are recorded after the last saved checkpoint.
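The checkpoint-and-replay idea can be sketched as follows. This is a toy model; the real operation log records namespace mutations, not the simple path-to-chunks assignments used here:

```python
def rebuild_state(checkpoint, log):
    """Recover the manager's metadata after a restart.

    `checkpoint` is (snapshot, last_applied_seq): the state saved when the log
    grew beyond a certain size. `log` is the operation log as (seq, record)
    entries; only entries newer than the checkpoint are replayed.
    """
    snapshot, last_seq = checkpoint
    state = dict(snapshot)
    for seq, (path, chunks) in log:
        if seq > last_seq:  # older entries are already reflected in the checkpoint
            state[path] = chunks
    return state

# Checkpoint taken after log entry 2; only entries 3 and 4 are replayed.
checkpoint = ({"/dir1/file1": ["gf61", "4be7"]}, 2)
log = [
    (1, ("/dir1/file1", ["gf61"])),
    (2, ("/dir1/file1", ["gf61", "4be7"])),
    (3, ("/dir1/file1", ["gf61", "4be7", "9b7h"])),
    (4, ("/dir2/file2", ["b85h"])),
]
state = rebuild_state(checkpoint, log)
```

Recovery time now depends on how much log has accumulated since the last checkpoint, not on the full history of the file system.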


Bird’s eye view#

In the coming lessons, we will design and evaluate GFS. The following concept map is a quick summary of the problem GFS solves and its novelties.

[Concept map: GFS. Problem statement: a distributed file system for storing large data, with key characteristics such as large storage capacity (100 TB and beyond in a single cluster), scalability, fault tolerance, high availability, data integrity, high aggregate performance, and atomicity and correctness. Solution space: a relaxed consistency model, chunk replication, manager replication, leases and mutation order, atomic record appends, and namespace management and locking. Architecture: a client (create, open, close, and delete files/directories; read and write file data), a single manager (metadata management, cluster-wide management such as replication, re-replication, rebalancing, chunk lease management, and garbage collection, plus chunkserver supervision via the heartbeat protocol), and multiple chunkservers with the Linux file system (perform data operations; send heartbeat messages to the manager about their current state), along with the benefits and challenges of each component.]
